Picture for Yixu Wang

Yixu Wang

SentGuard: Sentence-Level Streaming Guardrails for Large Language Models

Add code
Jun 01, 2026
Viaarxiv icon

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Add code
May 31, 2026
Viaarxiv icon

Towards Context-Invariant Safety Alignment for Large Language Models

Add code
May 20, 2026
Viaarxiv icon

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Add code
Jan 16, 2026
Viaarxiv icon

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Add code
Jan 04, 2026
Viaarxiv icon

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Add code
Nov 16, 2025
Viaarxiv icon

A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports

Add code
Oct 02, 2025
Viaarxiv icon

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Add code
Jul 24, 2025
Figure 1 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 2 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 3 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Figure 4 for SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law
Viaarxiv icon

JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models

Add code
May 26, 2025
Viaarxiv icon

SafeVid: Toward Safety Aligned Video Large Multimodal Models

Add code
May 17, 2025
Viaarxiv icon